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Abstract. We analyze large systems of interacting proteins, using techniques from the 
non-equilibrium statistical mechanics of disordered many-particle systems. Apart from 
protein production and removal, the most relevant microscopic processes in the proteome are 
q ' complex formation and dissociation, and the microscopic degrees of freedom are the evolving 

concentrations of unbound proteins (in multiple post-translational states) and of protein 
complexes. Here we only include dimer-complexes, for mathematical simplicity, and we draw the 
network that describes which proteins are reaction partners from an ensemble of random graphs 
with an arbitrary degree distribution. We show how generating functional analysis methods can 
be used successfully to derive closed equations for dynamical order parameters, representing 
an exact macroscopic description of the complex formation and dissociation dynamics in the 
infinite system limit. We end this paper with a discussion of the possible routes towards solving 
the nontrivial order parameter equations, either exactly (in specific limits) or approximately. 
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' 1. Introduction 

It is safe to say that biomedicine is undergoing a profound and irreversible change, brought 
about by what some coined a 'data tsunami'. A spectacular increase of both the quality and 
quantity of data on cellular signalling, molecular structure and gene expressions, including in 
vivo data, have prompted biomedicine to transform rapidly into a much more quantitative as 
opposed to descriptive science, and theory now lags behind experiment to an almost embarrassing 
extent. The biological systems thought to be responsible for generating these data tend to be 
complex and nonlinear, involving many variables and nested processes acting on widely separated 
timescales. Also, biology lacks many of the simplifying features of many large physical systems, 
such as detailed balance, the identical nature of the degrees of freedom, or translation invariance. 
In biological systems, the interacting objects and their mutual forces 9X6 clS 9 rule non-identical; 
in biology this would be called 'inhomogeneity', where in physics one would speak of 'disorder'. 

One particular biological system of great relevance to medicine is the proteome, the collection 
of about 2.10 4 cellular proteins and their molecular complexes, which constitute the main work 
force of the cell, responsible for executing most of the tasks that allow it to function, and for inter- 
and intra-cellular communication. The vast increase of proteomic data allow us to refine our 
understanding of its pathways and reactions, but we are running into new barriers: 'The most 
significant challenges that the mechanistic modelling might be facing are the lack of quantitative 
kinetic data and the combinatorial increase in the number of emerging distinct species and 



states of the protein network being simulated' pQ. Our aim in this paper is to contribute 
to our quantitative understanding of the proteome, by exploring the potential of quantitative 
macroscopic regularities emerging in very large as opposed to small proteomic systems. To do 
so we construct dynamical single-compartment models of large systems of interacting proteins, 
and analyze these in the limit of an infinite number of reaction partners, adapting techniques 
from the non-equilibrium statistical mechanics of disordered physical many-particle systems. 



2. Model definitions 

We imagine a cell in which N different proteins can be expressed, labeled by Roman indices 
i G {1, . . . , N}. Each protein can be in at most q post-translational states, labeled by Greek 
indices a G {1, . . . ,q}. The concentration in the cell of (i,a), the a-th post-translational state 
of protein i, will be written as xf . There are in principle ^N(N + 1) possible complexes that 
these proteins could form by pairwise binding, ^N(N — 1) hetero-dimers and N homo-dimers. 
We denote the complex where i binds to j as (ixj), and the concentration in the cell of the 
complex (ixj) For mathematical convenience we can always take 

2.1. Elementary processes and dynamical equations 

We aim to construct and analyze single-compartment type dynamical equations for the 
concentrations {xf} of unbound proteins and {xij} of their binary complexes, that capture 
the following elementary proteomic processes: 

elementary process: symbolic notation: process rate: 

„a3+„3 



binary complex formation: (i, a) + (j, j3) — > (ixj) k^ xfx\j 
binary complex dissociation: - ► (i, ol) + (j, (3) 

protein degradation/removal: (i,ce) — > 

protein synthesis: — > (i, a) 



ij X ij 



All rate parameters ,9f} are by definition non-negative. Of these parameters only 

{if >0f} are allowed to depend on time, in order to incorporate possible changes in membrane 
properties and gene expression levels; the complex formation/dissociation parameters {kf^}, 
in contrast, depend only on the structural characteristics of the reaction partners, via the indices 
(i, a) and (j, j3). Consistency demands that kfj = k^ . Not all pairs (i,j) can in practice form 
a stable complex (ixj), so we need (fixed) variables c^- G {0, 1} to define which are potential 
binding partners: if (ixj) is possible we put Cjj = Cji = 1, otherwise Cjj = Cji = 0. The non-directed 
graph c = {cij} thus defines the cell's protein interaction network. 

Combining the four elementary processes above with the structural information in c regarding 
binding partners, we are then led to the following set of Michaelis-Menten reaction equations: 
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*f = E^Ei^'^-k^xfx^+er-^x? (1) 

3 13 



fa Xi 3 ~ Ci -i ^ foij X i X j ^if Xi i\ (2) 
a/3 

If Cij = one will indeed have x^ = at all times, as one should. Equations (|l|2p satisfy the 
relevant mass conservation constraints: the total amount of any protein i (whether bound in 
dimers or in unbound form) changes only due to production/degradation imbalance, viz. 



^[E< + E^-] = £W-«) (3) 

a j a. 



The presence of concentrations X{j of complexes, with their two protein indices, could complicate 
a direct statistical mechanical analysis of equations (11)21) , However, the are seen to obey 
linear equations, which we can simply solve: 

= ^E< + f dse- k ^x?(s)x$(s) (4) 

with the short-hand k^ = J2a/3 ^if~- ^ we substitute (jlj) into equations ([I]) we obtain 

±xf(t) = £ % |d S X!^A(*-s|kiiK(«)^(s) + ^(t)-7f(*K(t) (5) 

with a partially retarded effective free protein interaction kernel 

W a , pX (r\k) = F A + [£ V#-e[ T ] e ->°-T - 8 ap 5(r) 

P 

which obeys X^/dr W a - p \(T\k) = 0, for all (p, A). 

The above definitions involve several simplifications, the main ones being that the only 
complexes are dimers, and that we disregard all spatial variations of protein and complex 
concentrations. However, we would argue that our definitions incorporate the main events in the 
proteome, while still allowing (as we will see) for an exact generating functional analysis in the 
limit N — > oo (under suitable conditions on the statistical properties of the model parameters). 



(6) 



2.2. Statistics of microscopic parameters - interaction network and reaction rates 
It is not possible at present to solve equations such as ([5]) analytically for arbitrary choices 
of the microscopic model parameters {(%j,kfj , Of, 7"}. Neither can we benefit from built-in 
parameter regularities or symmetries (in contrast to e.g. physical systems on lattices). This 
leaves the route of disordered systems theory, where one exploits the property that large many- 
particle systems often exhibit universal behaviour of macroscopic observables, whose values will 
in the infinite system size limit depend only on the statistics of the microscopic parameters rather 
than their precise realization. It follows that these values can then be calculated by performing 
suitable averages over all microscopic parameter realizations with the correct statistics. 

Thus we take the parameters {cij,kfj } to be generated randomly from some appropriate 
distribution, which should ideally incorporate as much of our available biological information 
as possible. We will in this paper draw the protein interaction network c at random from 
an ensemble of random graphs with prescribed degrees (k\, . . . , fcjv)j that are in turn drawn 
randomly from an as yet unspecified degree distribution p(k): 

p ( c ) = In^.E^^-nh^.i+a-co^o] (7) 

i i 

The self-connections ca are included to allow for homo-dimers (with cq € [0, 1] representing the 
fraction of proteins that can form homo-dimers), (k) = ^ZkP{k)k > gives the average number 
of hetero-dimer binding partners per protein, and Z is a normalization constant. In (JT]) all 
graphs that exhibit the enforced degree sequence (fci, . . . , fcjv) are generated equally likely. 

We similarly draw for each index pair (i,j) with % < j the complex formation/dissociation 
rates kjj = {k^j 3 ^ 1 } randomly and independently from a joint distribution P(k) (we can use the 
same symbol P here as in ([7]), as the arguments will prevent ambiguity). Averages over -P(k) will 
be written as Jdk P(k)/(k) = (/(k))k- Although the rates for reactions involving one protein 



pair with % < j are now statistically independent from those involving any other pair (k,£) 
with k < £, we do not assume statistical independence at the level of post-translational states 
or off-rates versus on-rates, i.e. we do not assume that -P(k) = Y\ a p[P{k a ^ + )P{k al3 ~)}. There 
are only two internal symmetries we have to insist on. First, when referring to homo-dimers we 
must have k a ^ = k^ aziz , for reasons of consistency. Second, in the case of hetero-dimers, if we 
define S*k via (S\t) a P ziz = £r a± then we will require that -P(S'k) = -P(k) for all k, for technical 
reasons that will become clear later. This implies that the probability of assigning specific on/off 
rates to any reaction (i, a) + (j, /3) «-> ixj is the same as the probability to find these rates for 
the reaction (i,(3) + (j, a) <-> ixj, but not that the actual rates are themselves identical. We 
note that, although the number q of potential post-translational states labeled by a = 1 . . . q 
is the same for all proteins, the actual number of post-translations states can be made to vary 
from one protein to another, by appropriate choices for the rate statistics -P(k). 

The production and decay rates (#f,7f) need not be drawn randomly for our methods to 
apply (as these have only a single protein index i), although one could do so for mathematical 
convenience. Here we will allow the rates (#f,7f) to be time-dependent, to incorporate the 
effects of proteome perturbations or receptor triggering. If each protein is always assembled in 
one unique conformation, we must choose the production rates such that for each i only one 
index a can have Of ^ (representing the protein's native state). 

3. Generating functional analysis 

In the spirit of [2] we define a generating functional that allows us to calculate the time-dependent 
statistics of unbound protein concentrations; from these follow, via (jl]), also those of all protein 
complexes. This functional Z[t/)] is a straightforward generalization to time dependent stochastic 
variables of the conventional moment generating function of random variables. 



3.1. The disorder-averaged generating functional 

We first discretize time according to t{ = iA, with an elementary time step A that will be sent 
to zero, and use 5- functions to enforce at each time the validity of the equations ©: 

iat 

x [] S[xf(t+ A) - xf(t) - A( £ ajjds J2 W a;pX (t-s\k ij )x p i (s)x^s) + 9f(t)- 1 r(t)xf(t 

iat j p\ 



nr dxf(t)dxf(t) iA ^a (i)a .a (t)+ixf (t) (xf (t+A)-xf (t)-A[e? (i)-7? (t)xf (i)] ) 

tat 



.e 



S[c,{k}] 



(8) 



in which only the following object depends on the network c and the process rates {k}: 
H[c,{k}] = -iA^Ci^E^W/^^^-^NKW^W 

ij apX t 
— i ^ ' (Hi^ii (k-ii) i ^ \ Cij^ijO^-ij) 



with 



3«(k) = ^JdtdTW a .&{T\k)%(t)x?(t-T)x*(t-T) 
ap\ 

3y(k) = JdtdT^2kP x+ xe(t-T)aj(t-T) 



(9) 



(10) 



:{6[r]e- k -^ ka ^^nt)+^(t)]-5(r)[x^t)+x x (t)]} 

a/3 



(11) 



To arrive at (|9ll0llip we have used k^^ = ^ Q± - Note that Z[0] = 1, by construction. 

We next average the generating functional over the disorder, i.e. over the randomly generated 
interaction networks ([7]) and reaction rates {1%}. The result will be written as Z[ip]. The idea 
behind definition ([8]) is that calculating Z[ip] gives us access to averages of observables that 
evolve according to the equations © without actually solving these equations, e.g. 

i d 



xf(t) = - lim lim — - — r Z[ib] (12) 
xf(t)x?(t>) = - lim lim —r 3 Z[i/>] (13) 



The disorder occurs only in H[c, {k}], so we have to calculate expH[c, {k}]. For the present 
reaction rates, drawn independently for each with i < j, we may use the fact that in ((7|) 
the homo-dimer entries cu are statistically independent of those representing hetero-dimers: 



e H[c,{k}] _ e -iJ2 l c " E "( k "^ . <> 'Hi .'-J' -^'j'' 

We tackle the two factors separately, using the fact that the relevant matrices have entries 
Hij(kjj) = O(N ). The first factor gives a simple expression which factorizes over the proteins: 



e-iEi^S"^) = Tj[i_ Co + Co ( e - i3 «( k )) k ] (15) 

i 

The second factor contains the main complications. To average over the random interaction 
networks we first use the fact that a mathematically equivalent way to write (J7]) is 

i i i<j 

The reason is that the extra Poissonnian factor can be written in terms of the average degree 
(k) = N^ 1 ki (and hence absorbed in the normalization constant Z), via the identity 

n[f^.+(i-f)M = (it) 



(since ki = Yl,j^i c ii f° r an h by virtue of the constrained network degrees). We use integral 
representations to implement the degree constraints, i.e. 

ik.e^, =n£t i ^ i(ki ^ iCij) = £u[^ iki V iEi< ^ i+u;j) 

In combination these ingredients allow us to write 

= \ f II t^*] II. [l + ^(e-^+^e-^V - l)" 

= I rTjr^ e ^He^^ [e_iK+W ' )<e ^ (k)>k - 11+O(iV0) (19) 



We may now use ^,jj(k) — Sjj(iS'k) (with the previously introduced post-translational state swap 
operator S defined via (S~k) al3± = k^ a± ) and the convention that P(Sk) = P(k), which allows 
us to write (e _1=i:I ^)k = (e~ 1= ^ 5k ))k = (e - ^*^)!^ and symmetrize the summation Y,i<j : 



e -iE, <3 c^ lj{ k l3 ) = ii!!! f 1 jj ^^^LiE^-'^^'ie-^^v+o^) (20) 



Obtaining full factorization over protein variables in Z[ij>], in leading order in N, requires finding 
a way to disentangle the different protein variables that appear in the quantity 

_L^ e -i(o; i +^) (e -i Si ,(k) )k = (21) 

ij 

AT2" 



3. 2. Introduction of dynamical order parameters 

To achieve factorization we introduce an appropriate dynamical order parameter and isolate the 
joint distribution of unbound protein concentration 'paths' {xf} = {xf (i)} (with t € K), for 
each of their post-translational variants, and their conjugate concentration paths: 

P[{z,z}|{x,x},c] = ^l[S[{x a } - {xf}]S[{x a } - {xf}]e~^ (22) 

i a 

With this macroscopic object we can write, upon defining the short-hand {dx} = Ylta^ x a(t) 
and sending A — > wherever feasible, 



^ e -i(^+u^ e -iH i3 (k)^ = f {d x didx'di'}P[{x,x}|{x,x},cj]P[{a;',f'}|{x,x}, ( 

ij J 

iJdtdTj2 pX ke x +x p (t~T)x' x (t-T){elT}c- k --J2 a0 ka ^l^ (23) 



To transform Z[tp] into an expression that can be evaluated for N — > oo by steepest descent, we 
have to relocate the kernels . . .], via the insertion (for each joint path {x,x}) of 

1 = JdP[{x,x}]5[P[{x,x}]-P[{x,x}\{xi,&}M 

_ f dP[{x,x}]dP[{x,x}} ANP\{x,x}]\P\{x,x}]-P\ix,x}tfx,x},lV]\ ( 0A \ 

~ J 2tt/N [ZV 

After appropriate re-scaling of the conjugate kernels P[{x, x}] (involving factors N and A) in 
order to obtain well-defined functional integrals with the correct iV-scaling, we then arrive at 



glE,/ | ("*+'*i)<e i3 ^ (k) ) k = J^ d p d py^f{dxdx}Pl{x,x}]Pl{x,x}]J^ ^[{x^}}^ 

i 

x exp ji(A;)iV J{dxdxdx'dx'}P[{x,x}}P[{x',x'}} (25) 
r i / d ' dT E M ^ + ^(*- T ) x l(*- T ){ fl [ T ] e "*" T E Q / a ' 5 1^(t)+^W]-^T)[ipW+^(t)]} 



x(e 



Only the last factor of the first line contains the protein variables {xi, Xi, Wj}, and in a factorized 
form. This allows us to combine our intermediate results (|8|9|10)1 1)15)20)251) into 

~ZW\ = {dPdP} exp [0(N ) + \N J {dxdx}P[{x, x}]P[{x, x}] - ^N(k) 
+-N(k) f {dxdxdx'dx'}P[{x, x}]P[{x', x'}] 



2 

x FJ ^ e iwk * J{dxdx} jf^^i (*)*«(*)+* /d*E«Mt)[&*«(t)-«f W+^f (*)*»<*)] 

xe -iP[{x,£}]e-- Jl_ Cq + CQ^-iE^pA J dMr WaipA(T|k)*a(*)*p(*-T)*A(t-T)^ k j | (26) 

Now we can for N — > oo carry out the functional integrations over {P, P} by steepest descent. 
Upon defining P[{x,x}] = iQ[{x,x}] the outcome takes the form 

lim 1 log ZW] = extr {P)Q} {tf [{P, Q}] + d»[{P}] + n[{Q}} + const} (27) 



where the constant follows from the identity logZ[0] = 0, and with 

*[{P,Q}] = - J{dxdx}P[{x,x}]Q[{x,x}]-^(k) (28) 
$[{P}] = -(k)f{dxdxdx'dx'}P[{x,x}]P[{x',x'}]x (29) 

^-i J dtdTj2 pX kP^+x p (t-T)x' x (t~r){e[T]c- k -- ^^fc^-[^(t)+^(t)]~5(r)[£ p (t)+^(t)]}^ 



fi[{Q}] = <J log ^ — e iwk [{dxdx} e'^'So ka(*)*a(t)+*a(t)(^aa(t)-*a(t)+7a(t)*a(*))] 



7T 27T 

Q[{x,i}]e- i " ( / f ,-isE ap A/ dMT H/ a;pA(T|k)£ a (t)a;p(t-T)a; A (t-r)\ ) x /^q\ 



/k, S /{^,e, 7 ,fc} 

with the short-hands (/[{^ a , 6^, 7 Q }, fe]){^,e, 7 ,jfc} = linijv-»oo N' 1 J2i f[{4>i ! , , 7f }> and with 
{f{s)) s = co/(l) + (1 — co)/(0). The random variable s controls whether (s = 1) or not (s = 0) a 
given protein species can form homo-dimers. The functional saddle-point equations that define 
the extremum in (I27p . and are fully closed by definition, are the following: 

MP,Q}]+ sn J ^M {P}] = o (3i) 



5P[{x,x}} 11 '^ JJ 5P[{x,x}} 

] MP, Q}\ + x ^f mt O[{Q}] = (32) 



From now on we analyze equations (|31I32|) only for -0 = 0, the generating fields "0 will be used 
in due course only to identify the physical meaning of our order parameters. 



3.3. Saddle point equations 

For N — > oo the dynamical order parameters P[{x,x}] and take well-defined values 

that depend only on the statistical properties of the microscopic system parameters, and which 



follow upon solving the coupled equations (131)320 . Working out these fully exact formulae gives: 
Q[{x,x}} = (k)f{dx'dx'}P[{x',x'}]x (33) 

^-i / dtdTj2 pX k pX + X p (t~T)x> x (t-T){8{T]c- k -- ^^-[x Q (t)+^(t)]-5(T)[i,tt)+^tt)]}j 

r ^ iu;(fe-l)+Q[{x,£}]e-- M rr f g }1 

L1 ' 11 \ /{di'df} ^^e'^+e^'i'}] 6 "'^^', f, fl, 7 }] /f«,7fl V ' 

in which 

M[{x X 9 7}] = / e i / d *S c « £a W[^ :CQ (*)~ 6 ' a W + '>' c '( i ) :rc 'W~ s SpA/ d ' r W a . pX {r\W)x p (t-T)x x {t-r)\ 



(35) 

The identity (2-7r) -1 dw expfiwA; + ze~ ltJ ] = z k /k\ allows us to simplify (|34p to 

Ll ' /J \/{dx'dx'}Q fc [{x',x'}]M[{x',f' ,0,7}]/{0,7,fc} 1 ; 

It seems appropriate at this stage to switch from -P[{x, x}] to its partial Fourier transform, which 
will lead to equations involving only real-valued order parameters that can be interpreted (as 
will be shown) as conditional path probabilities in the sense suggested by our adopted notation: 

= /{dic}P[{a:,i}]e- i / dt S«, »-(*)*«*(*) (37) 

Adopting this new definition converts our closed equations (|33|36p into 

m r ; ,r X] I k /{dxjQ^ 1 [{x, x}]M[{x, x,9+y, 7}] v 

[X nXVn \ f{dx'dx'}Q k [{x>,x>}]M[{x',x',0,j}} /{6,iM 1 ' 



Q[{x,x}] = (k) /"{da/dy}^^}^^- 1 ^^ 

Y[d[y a (t) - J dr^2W a . pX (r\Sk)x' p (t-T)x x (t-T)]) k (39) 



x 

at p\ 



with the previously introduced short-hand (Sk) a/3:iz = fc^ a± , and where we have used the 
following identity which follows trivially from definition (|35p . 



M[{x,x,e,7}]e- i f dt ^c.v°®*°®=M[{x t x,9+y,7}] (40) 



To work out equations (|38|39p further we need to evaluate the following integrals, for k > 0: 
{dx}Q k [{x,x}]M[{x,x,0, 7 }] 

= C(k) k ( J] [{dx £ dya^[{x,}|{^}](ll^KW - fdrJ2W a;pX (r\Sk £ )x ep (t-T)x x (t-T) 

£<k Cod p\ 

at 

|dTx p (t-r)(^ a;pA (r|k)x A (t-r)+^W a;pA (r|k,)x, A (t-r))]) kski ^ (41) 
pA £ 

Here C is a constant, which will drop out of our equations. Further simplification of equations 
(I38|39p requires knowing the physical interpretation of the order parameter W[{x}|{y}]. 



3.4- Physical meaning of order parameters 

We can infer the meaning of {y}], starting from (|22|37[) and using the manipulations 

applied to (jSj), which effectively boils down to making in (I26p the substitution 



e'/^E.oCW^W ^ _L Y i e- i ^l[[d[{x a }-{x^}}e- i I dt ^ t ^] (42) 

3 a 

Carrying out this substitution shows that insertion of the factor exp[— iuj] into (|26|) is equivalent 
to replacing kj — > kj — 1, and that insertion of the factor exp[— i J dt y a (t)x^(t)] is equivalent to 
replacing 6J(t) — > + y a (t)- We may therefore conclude that 

1 



W[MIM] = -{zj}])!*,,.^.-!, r ( ^r (i)+!/a ( ( ) vo ( 43 ) 

3 

where the brackets (...) denote evaluation of the argument for the microscopic process ([S]). 
H^[{x}|{y}] is apparently a generalized response function. It gives the probability that if we 
pick at random a protein species j, remove one randomly selected binding partner from it and 
increase instead its production rates by (time-dependent) amounts {y a }, we will observe 
for the post-translational states of that protein the concentration evolution Similarly, 

D[{x}\{y}] = ^ E WW - {vMofW^fW+v-W Va (44) 

3 

(which quantifies the response to perturbation of the actual original system, without the node 
removals). Since the calculation of (|44j) differs from that of (143)) only in the removal of the 
factors exp[— icjj] from definition (j22[) . tracing the differences between the two calculations is 
easy. One finds that it boils down to replacing exp[iu>(k — 1)] by expfiw/j] in (f34"l) . and hence 
replacing kQ k x}] by Q fc [{x,x}] in ([58]) . which leads us directly to 

0\{x\\{y\\ - ^m{ J{dx , dmk[{x , ;&m (45) 

Both ^[{^llly}] and -D[{x}|{y}] are conditional measures in the paths {x}, so they must obey 
J{dx} W[{x}\{y}] = J{dx} D[{x}\{y}} = 1 for all {y}. After solving W[{a:}|{2/}] from ([38l39]) . 
one obtains via (|45|) . and extracts -D[{x}|{0}] as the measure for the concentrations 

of unbound proteins in the unperturbed system. It follows from (|43|) . together with the causality 
of our equations © and the fact that we have discretized our microscopic laws according to the 
Ito convention dx(t) = x(t + dt) —x(t), that each x a (t) in can depend only on those 

yp(t') that have t' < t (to be precise: that have t' < t — A before the A — > limit). 

3. 5. Implications of causality 

Returning to (HTj) . we see that causality allows us to integrate (j4"Tj) over {x}. We first discretize 
time and integrate over the x a (t) with t = t max , followed by integration over the x' la {t max ) and 
the 2/fo(i m ax) (in that order). The net result is the same expression we started out with, but 
with t max replaced by i max — A. Repetition of the argument then leads to the simple result 

J{dxdx} Q k [{x,x}]M[{x,x,e 7l }} = C{k) k (46) 

Integration of (|38|) over {x} reproduces the correct normalization /{dx}W[{:z;}|{y}] = 1. We 
insert relation (|46|) into (|38p . which thereby simplifies to 

W[{x}\{y}} = E4|r fmQ k -H{x,£}](M[{x,x,e+ yn }}) (47) 



Here (. . -}{e^\k} denotes averaging over the statistics of production and decay rates of those 
proteins that have k binding partners. Finally we insert (I4ip to eliminate the auxiliary kernel 
<5[{x, x}], and use Ct = k\Co. This brings us after some simple re-arrangements to a transparent 
closed equation that involves the physical kernel VF[{:r}|{y}] only. This equation, exact in the 
limit N — > oo, is the final result of the generating functional analysis: 

W[{x}\{y}] = E4|r 

k>i { ' 

x (( / II | {dx,dya^[{xai{ya]n4^W-/ dr E^;pA(T|5k,)^ p (t-T)x A (t-r) 
J Kk { at J p\ 

l[5 \dx a (t) - dt\e a (t) + y a (t) - -f a (t)x a (t) (48) 



x 
at 



+ E/ dr x p {t-T)(s 




k, S ;k 1 ...k fc _ 1 /{0, 7 |fe} 



pX £<k 

From the solution of (|48|) then follows D[{x}|{y}], defined in (|44|) . via equation (|45|) . giving 
D[{x}\{y}] = 



k>0 



(<k I at J p\ 



X 

at 



5 l dx a (t) - dt[e a (t) + y a (t) - -y„(t)x a (t) (49) 

at { 

+ E/ dT X p {t-T)(s W a - p \(T\k)x\(t-T) + '^ W a -px(T\k e )Xix(t-T) 



pX l<k 



/k,s;ki...k fc /{fl,7|Jfc} 



Although different at the level of mathematical details, the structure of the above equations 
is very similar to what was found in non-equilibrium statistical mechanical studies of spin 
models on finitely connected random graphs, such as O HI EJ E]. In the special case of 
protein interaction networks with Poissonian degree distributions, i.e. p(k) = e~ c c k /k\, where 
p{k + \)(k + \) / (k) = p(k) for all k, a simple transformation k — > k + 1 shows that the kernels 
P[{x}|{y}] and -D[{x}|{y}] will be identical. In all other cases this will not be true. 

3. 6. Cavity interpretation of the order parameter equations 

One can understand our results (|48|49|) at a more intuitive level, starting from equations (0). 
Any protein species i interacts with the proteome only via its direct partners, the species in the 
set di = {l\ en = 1}. Given the paths {xi} taken by the concentrations of the species £ € <9j, the 
path {xi} taken by i could in principle be calculated (modulo non-invertibility issues) by solving 
a linear equation, giving an expression of the form {xj} = J-i[{xi},l£di\. This is illustrated in 
figure QJ A], where site i is shown in black for an example with fcj = 4. Next we imagine changing 
the structure of the graph locally, by removing the information flow from i to di, see figure QJB] 
where the protein species in di are drawn as ©. We compensate for this intervention, however, 
by adjusting the production rates of all j € di according to 0J(t) — > 0J{t) + ?/"({xj, Xj}, t) such 
that we do not change any of the concentration paths in the system, which requires 

yf({xi, Xj },t) = fdsJ2W a;p x(t-s\k ie )xf(s)x P 1 (s) (50) 

P x 





§£ ef+yf({xi, Xj }) 
j 




§£0?+I/?(Ka:i}) 

J 



+ 2/n ({^j) ^n}) 



Figure 1. Interpretation of our order parameter equations. Top left [A]: section of the original 
interaction network, showing a site % which has four interaction partners (these define the set 
di). Top right [B]: a locally modified network where information flow from i — > d% is prohibited 
(so node i is effectively removed), but the production rates of all j € di (i.e. of all sites marked 
as ©) are adjusted in compensation, so that none of the concentrations in the system change, 
following (|50p . Bottom [C]: result of a further modification, where also the information flow 
from j — > dj is prohibited (so also node j is effectively removed), but again compensated for by 
appropriate adjustment of the production rates of the sites in dj = {m,n}. 



Hence, with the conventions di = {jn, ■ ■ ■ ,jiki} an d -D[{x}] = -D[{x}|{0}] we may write 
1 



D i{x}] = Jim ^^[M-^fe,...,^ }]] 

iV— S>00 iV 



jvSioJV? J n {{ dx e} 6 l{ x e}-{ x 3a}}} 5[{x}-Fil{ Xl ,...,x ki }}] 

i 1=1 



/ A [i dx ^ 6 ^ X ^~i X Ju}]i removed, {0 iu +y Ju 
i 1=1 
k 

1 f t 



({x,a; £ })} 



i £=1 



^[{^-^[{xi,...,^}]] 
(51) 



X II removed, {e Hl + yi }^[{y£} ~ {Vj u (i X > »/})}]] *[ M [{ X l, ■ ■ ■ , a*J]] 

If the original network had no loops, then in the new 'cavity' graph the branches attached to sites 
£ S di are disconnected, and each will behave as an independent graph, for large N topologically 
equivalent to the original. It follows that in the cavity graph we can treat the paths {x^} with 
£ € di as statistically independent, and assume they have identical statistical properties, so 



D[{x}] = i^/n^^^E^}-^}] 



i removed, {8j+yi} 



(52) 



i £=1 



i8[{x}-J 7 i[{x 1 ,...,x k .}]] n 5[{y £ } - {y ju ({x,x e })}] 



e<ki 



k 

= ^ ^ E /ft [{dxidye}W[{x £ }\{y,}]] (53) 

i t=\ 

x 8[{x}-J 7 i [{xi,...,x ki }\] Yl S[{yt}-{yj u ({x,xi})}] 

t<ki 

One will now be led to equation (I49p . upon simply adding {y} to the production rates {6i} 
(which brings us from D[{a;}] to -D[{x}|{y}]) and with the last two lines of (|49|) representing the 
more explicit representation of our symbolic expression <5[{x}— ^[{xi, . . . , a^}]]. 

What remains is to understand the origin of (|48jl . which requires an expression for the 
concentration path statistics of nodes in d{ , such as j in figure [HB] • We repeat the process of 
blocking the information flow away from the node of which we try to calculate the concentration 
paths, while compensating the production rates of its partners appropriately. Now this means 
removing j, and adjusting the production rates of the nodes in dj (except for i, which has been 
removed already); see figure QJC]. The equation for W[{a;}|{y}] is therefore nearly identical to 
that of D[{x}|{y}], with two differences: first, since we are looking strictly at nodes j that were 
connected to a (now removed) cavity node i, these nodes j no longer have in-degree statistics 
p(k), and second, only kj — 1 of the original kj partners of each j contribute to {xj}. Since our 
random graph ensemble has no degree-degree correlations, the modified degree statistics p(k) of 
the nodes j G d% that were initially attached to (randomly drawn) 'cavity' sites i follows from 

p(k) = lim gggfe = hm -£-5>. k = 4^ (54) 

We can thus obtain our equation for VFflxjljy}] by making in the right-hand side of (|4"9"]) the 
replacements p(k) — > p(k)k/{k), Ylt,<k ~^ T\t<ki an d Y^e<k ~^ J2e<k- The result is indeed ([15]) . 

It is satisfactory that we have been able to use generating functional analysis (GFA) to obtain 
(|48l49p . which is more precise and direct than the above reasoning and did not require the strict 
absence of loops (in effect, GFA confirms that for N — > oo any loops generated in ([7]) have 
vanishing impact on the process). Second, as soon as we use more complicated ensembles than 
([7]), e.g. those with degree-degree correlations as in |7J, or introduce correlations between the 
reaction rates of distinct protein pairs, the above simple arguments would become prohibitively 
messy, whereas the GFA route should in principle remain open. 



4. Solution of the macroscopic equations 

The focus must now turn to solving equation ([48 j) for W[{x}|{y}], from which D[{x}|{y}] follows 
via (|49|). This is a highly nontrivial problem, on which progress has so far been slow; not just 
here, but in all GFA studies of processes on finitely connected graphs [3J E[ E]. The possible 
routes to be explored come, roughly, in four areas. This paper is only the first step in a research 
programme, establishing 'proof of principle' that GFA methods can indeed be used to study the 
proteome, hence in view of space limitations we will here only comment briefly on each area: 



4-1. Numerical solution of the macroscopic laws 

Due to the exponential increase with time of the number of macroscopic observables concerned, 
even in studies with discrete time and discrete variables [3] [5] [6] , numerical solution of equations 
such as (I48p was possible only for a small number of time steps. In the present problem, where 
the arguments of VF[{x}|{y}] are continuous paths, numerical solution is not a realistic option. 



4-2. Working out solutions in specific simplifying limits 

The two main limits where analytical solution is possible are low connectivity and high 
connectivity (where (k) —>■ oo must be preceded by a suitable re-scaling k^ —>■ k^/ (k) of 
all on-rates). If we assume our network has no disconnected nodes, then the lowest overall 
connectivity is found for p(k) = S^i, where we find (|48|49|) reducing to 



W[{x}\{y}} = ((l[5{dx a (t)-dt[e a (t) + y a (t)- 7a (t)x a (t) (55) 

at 

+ £/dr x p (t-r)(.W a!AA (r|k)x A (t-r))]}) k \ 

pA 1 

D[{x}\{y}] = (/{dxV}W[{x , }|{y7]n<^W-/ d ^E^;pA(^|5k / )4(t-r)x A (t-r)" 

J at J p\ 

x(j[5{dx a (t) - dt[e a (t) + y a (t) - -f a {t)x a {t) (56) 

at 

+ T,j dT Xp{t-T)^sW a]pX {T\k)x X (t-T) + W a ,p X (T\k e )x' x (t-T))]} 



k,s;k'/{6»,7} 
p/\ 

These equations are easily interpreted (all cavity sites are now isolated) and converted into 
a pair of coupled stochastic differential equations, whose statistics represent the diversity of 
concentration paths in the original iV-protein system. For large connectivity we first re-scale the 
reaction on-rates according to k a @ = k a @ + /(k). One now finds that Z?[{x}] = D[{x}|{0}] = 
P^[{x}|{0}] obeys a closed equation as (k) — » oo, which using the law of large numbers becomes 



D[{x}} = (Y[6[dx a (t)-dtU a (t)-<y a (t)x a (t) + 8Y, /dr W pX (r\k)x p (t -T)x\(t-r) 

at ^ pA J 

+ Jdr Wp X (r\Sk)x x (t-r) J{dx'} D[{x'}]x' p (t-r))] ) g (57) 

pA 

Again the problem can be converted into a relatively simple stochastic equation. The protein- 
protein interaction occurs in a 'mean field' way. In fact for (k) — > oo we can extract from (I57p 
a closed equation for the disorder-averaged path {X} = j{dx}D[{x}]{x}, rather than a closed 
theory in the language of correlation- and response functions (see e.g. [8]) which one would 
have found upon taking the diverging connectivity limit in models of disordered spin systems or 
neural networks on finitely connected random graphs. 

4-3. Constructing ad-hoc approximations 

Various approximations could be considered. First, one could solve (|48|) iteratively, generating 
a sequence of measures W n [{2;}|{y}] by substitution for each n of W n [{x}|{2/}] in the right-hand 
side of PH|) and defining the left-hand side as W n +i[{x}|{y}]. Upon starting e.g. from the trivial 
^[{^lliy}] = <5[{x}], one would find VTi[{x}|{y}] describing a dimers-only system of proteins, 
and Wi [{x} \ {y}] describing interacting proteins but without an Onsager reaction term, etc. An 
alternative would be a linear response approximation, based on truncating the exact expansion 

(58) 



Each such approximation will have specific strengths and weaknesses, which will depend on the 
statistical characteristics (topology, strength of interactions) of the system being studied. 



4-4- Probing further the mathematical structure of the macroscopic laws 

An obvious question to be investigated is whether in stationary states there exist exact closed 
laws for a reduced set of static order parameters (this may require the equivalent of detailed 
balance, e.g. demanding that 5k = k for all reaction rates). The natural candidates would 
be the probability densities W(x|y) for asymptotic time-averages x of protein concentrations, 
given asymptotic time averages y of production perturbations (as opposed to paths). 

One could also try to exploit the origin of (|48|) as a saddle-point equation for a functional. 
Upon applying the various transformations of order parameters directly to the function 
^[{-P, Q}] + ^[{i 3 }] + in (|27|) . with an inverse Fourier transform for Q, one obtains 



£[{V,W}} 



{dxdy} W[{x}\{y}]V[{y}\{x}} 



1 



with a kernel Ml. . 



+- /{dxdydx'dy'} W[{x}\{y}} M[{x, y};{x', y'}\ W[{x'}\{y'}} 
p(k) 



log /{d*}/n [{« v\{yz}\{*}] 

J J Kk 



{x}-F[{0+J2ye};s,k\ 

£<k 



(59) 
k,s/{e,j\k} 



that contains all the information on protein-protein interaction, 



M[{x,y};{x',y'}] 



I[5[ya(t)- 

at 



dr E W a ; P x(r\k)x p (t-T)x' x (t- 



n^^(t)-/dr^Vr Q;pA (r|5k)^(t-r)x A (t-r) 

at J p\ 



and where s, k] denotes the solution of the equation 

Oait) -7a(*)x Q (t) +sJ2 J dr W a]p x(T\k)Xp(t-T)x X (t- 



d , \ 
dt^ 



(60) 



(61) 



pX 



One's instinct next would be to attempt a variational formulation: to define on the basis of 
(I59p a function £[{W}] whose minimum would give the true solution. It turns out that this 
will never be possible here, due to an intriguing property of f)59[) : one can show that for all 
functions that obey causality, the surface £[{V, W}] is no longer dependent upon whether or not 
the proteins interact. The conclusion must be that all attempts to convert (|59p into a variational 
problem, suitable for generating variational approximations of W [{x}\{y}], are doomed from the 
start, since variation within any subset of causal measures will at most give us information on 
VF[{x}|{y}] that would be true irrespective of whether or not proteins form hetero-dimers. 
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